100 research outputs found
Recognition times for 54 thousand Dutch words: data from the Dutch Crowdsourcing Project
We present a new database of Dutch word recognition times for a total of 54 thousand words, called the Dutch Crowdsourcing Project. The data were collected with an Internet vocabulary test and are limited to native Dutch speakers. Participants were asked to indicate which words they knew. Their response times were registered, even though the participants were not asked to respond as fast as possible. Still, the response times correlate around .7 with the response times of the Dutch Lexicon Projects for shared words. Results of virtual experiments also indicate that the new response times are a valid addition to the Dutch Lexicon Projects. This not only means that we have useful response times for some 20 thousand extra words, but we now also have data on differences in response latencies as a function of education and age. The new data correspond better to word use in the Netherlands.
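The validation step described in the abstract, correlating the crowdsourced response times with those of the Dutch Lexicon Projects for the words the two databases share, can be sketched as follows. This is a minimal illustration; the words and millisecond values below are invented, not data from either project:

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical mean response times (ms) per word in each database.
crowd_rt = {"huis": 620, "fiets": 650, "boterham": 710, "gezellig": 690}
dlp_rt   = {"huis": 540, "fiets": 575, "boterham": 640, "gezellig": 600}

# Correlate only the words shared by both databases.
shared = sorted(set(crowd_rt) & set(dlp_rt))
r = pearson([crowd_rt[w] for w in shared], [dlp_rt[w] for w in shared])
```

With real data the resulting r is the ~.7 figure reported above; a high correlation on shared words is what licenses treating the untimed crowdsourced latencies as a valid extension of the lab-collected ones.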
A plea for more interactions between psycholinguistics and natural language processing research
A new development in psycholinguistics is the use of regression analyses on tens of thousands of words, known as the megastudy approach. This development has led to the collection of processing times and subjective ratings (of age of acquisition, concreteness, valence, and arousal) for most of the existing words in English and Dutch. In addition, a crowdsourcing study in the Dutch language has resulted in information about how well 52,000 lemmas are known. This information is likely to be of interest to NLP researchers and computational linguists. At the same time, large-scale measures of word characteristics developed in the latter traditions are likely to be pivotal in bringing the megastudy approach to the next level.
Percepcija tipičnosti u leksikonu: tipičnost oblika riječi, leksička gustoća i morfonotaktička ograničenja (Perception of typicality in the lexicon: wordlikeness, lexical density and morphonotactic constraints)
The extent to which a symbolic time-series (a sequence of sounds or letters) is a typical
word of a language, referred to as WORDLIKENESS, has been shown to have effects in speech
perception and production, reading proficiency, lexical development and lexical access, and
short-term and long-term verbal memory. Two quantitative models have been suggested to
account for these effects: serial phonotactic probabilities (the likelihood for a given symbolic
sequence to appear in the lexicon) and lexical density (the extent to which other words can
be obtained from a target word by changing, deleting or inserting one or more symbols
in the target). The two measures are highly correlated and thus easily confounded when
measuring their effects in lexical tasks. In this paper, we propose a computational model
of lexical organisation, based on Self-Organising Maps with Hebbian connections defined
over a temporal layer (TSOMs), providing a principled algorithmic account of effects in
lexical acquisition, processing and access, to further investigate these issues. In particular,
we show that (morpho-)phonotactic probabilities and lexical density, though correlated in
lexical organisation, can be taken to focus on different aspects of speakers' word-processing
behaviour and thus provide independent cognitive contributions to our understanding of
the principles of perception of typicality that govern lexical organisation.
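The two competing measures contrasted in this abstract, serial phonotactic probability and lexical density, can be sketched on a toy lexicon. The bigram model and the edit-distance-1 neighbourhood below are simplified illustrations of the two measures, not the TSOM model the paper itself proposes; the lexicon is invented:

```python
from collections import Counter

def bigram_probability(word, lexicon):
    """Serial phonotactic probability: product of symbol-bigram
    probabilities estimated from the lexicon ('#' marks word boundaries)."""
    bigrams, unigrams = Counter(), Counter()
    for w in lexicon:
        padded = "#" + w + "#"
        for a, b in zip(padded, padded[1:]):
            bigrams[a + b] += 1
            unigrams[a] += 1
    p = 1.0
    padded = "#" + word + "#"
    for a, b in zip(padded, padded[1:]):
        p *= bigrams[a + b] / unigrams[a] if unigrams[a] else 0.0
    return p

def neighbours(word, lexicon, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Lexical density: lexicon words reachable from the target by one
    substitution, deletion or insertion (edit distance 1)."""
    forms = set()
    for i in range(len(word)):
        forms.add(word[:i] + word[i + 1:])            # deletion
        for c in alphabet:
            forms.add(word[:i] + c + word[i + 1:])    # substitution
    for i in range(len(word) + 1):
        for c in alphabet:
            forms.add(word[:i] + c + word[i:])        # insertion
    forms.discard(word)
    return forms & set(lexicon)

lexicon = {"cat", "cap", "can", "cot", "scat", "at", "dog"}
dens = neighbours("cat", lexicon)
```

On such a toy lexicon the two measures already co-vary: a word built from frequent bigrams tends to sit in a dense neighbourhood, which is exactly the confound the paper sets out to tease apart.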
Corpus linguistics
The first comprehensive guide to research methods and technologies in psycholinguistics and the neurobiology of language. Bringing together contributions from a distinguished group of researchers and practitioners, editors Annette M. B. de Groot and Peter Hagoort explore the methods and technologies used by researchers of language acquisition, language processing, and communication, including traditional observational and behavioral methods, computational modelling, corpus linguistics, and virtual reality. The book also examines neurobiological methods, including functional and structural neuroimaging and molecular genetics. Ideal for students engaged in the field, Research Methods in Psycholinguistics and the Neurobiology of Language examines the relative strengths and weaknesses of various methods in relation to competing approaches. It describes the apparatus involved, the nature of the stimuli and data used, and the data collection and analysis techniques for each method. Featuring numerous example studies, along with many full-color illustrations, this indispensable text will help readers gain a clear picture of the practices and tools described.
Assessing the Usefulness of Google Books' Word Frequencies for Psycholinguistic Research on Word Processing
In this Perspective Article we assess the usefulness of Google's new word frequencies for word recognition research (lexical decision and word naming). We find that, despite the massive corpus on which the Google estimates are based (131 billion words from books published in the United States alone), the Google American English frequencies explain 11% less of the variance in the lexical decision times from the English Lexicon Project (Balota et al., 2007) than the SUBTLEX-US word frequencies, based on a corpus of 51 million words from film and television subtitles. Further analyses indicate that word frequencies derived from recent books (published after 2000) are better predictors of word processing times than frequencies based on the full corpus, and that word frequencies based on fiction books predict word processing times better than word frequencies based on the full corpus. The most predictive word frequencies from Google still do not explain more of the variance in word recognition times of undergraduate students and older adults than the subtitle-based word frequencies.
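The comparison reported above amounts to regressing lexical decision times on log word frequency separately for each frequency norm and comparing the variance explained (R²). A minimal single-predictor sketch, with invented numbers rather than the ELP, Google, or SUBTLEX data:

```python
def r_squared(xs, ys):
    """Variance in ys explained by a simple linear regression on xs
    (for one predictor this equals the squared Pearson correlation)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)

# Hypothetical per-word mean lexical decision times (ms) and
# log10 frequencies from two competing norms.
rts        = [520, 560, 610, 680, 740]
log_freq_a = [4.1, 3.5, 2.9, 2.1, 1.4]   # e.g. a subtitle-based norm
log_freq_b = [4.0, 3.9, 2.5, 2.6, 1.2]   # e.g. a book-based norm

r2_a = r_squared(log_freq_a, rts)
r2_b = r_squared(log_freq_b, rts)
```

The norm whose frequencies track processing times more tightly yields the higher R²; the article's "11% less variance explained" is exactly this kind of difference, computed over tens of thousands of words.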
Which words do English non-native speakers know? New supernational levels based on yes/no decision
To have more information about the English words known by second language (L2) speakers, we ran a large-scale crowdsourcing vocabulary test, which yielded 17 million useful responses. It provided us with a list of 445 words known to nearly all participants. The list was compared to various existing lists of words advised to include in the first stages of English L2 teaching. The data also provided us with a ranking of 61,000 words in terms of degree and speed of word recognition in English L2 speakers, which correlated r = .85 with a similar ranking based on native English speakers. The L2 speakers in our study were relatively better at academic words (which are often cognates in their mother tongue) and words related to experiences English L2 students are likely to have. They were worse at words related to childhood and family life. Finally, a new list of 20 levels of 1,000 word families is presented, which will be of use to English L2 teachers, as the levels represent the order in which English vocabulary seems to be acquired by L2 learners across the world
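Turning such a ranking of word families, ordered from best known to least known, into teaching levels of 1,000 families each is a simple bucketing step. A sketch with invented families and a toy level size of 2 instead of 1,000:

```python
def vocabulary_levels(ranked_families, level_size=1000):
    """Split a list of word families, ordered from best known to least
    known, into consecutive levels of `level_size` families each."""
    return [ranked_families[i:i + level_size]
            for i in range(0, len(ranked_families), level_size)]

# Toy example: six families, levels of 2.
ranked = ["the", "good", "house", "dog", "whittle", "gormless"]
levels = vocabulary_levels(ranked, level_size=2)
```

Because the ranking is derived from L2 learners' own yes/no decisions, the resulting levels reflect the order in which vocabulary is actually acquired, rather than raw corpus frequency.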
- …